Kernel Methods for Unsupervised Learning
Author
Abstract
Kernel methods are algorithms that project input data through a nonlinear mapping into a new space (the feature space). In this thesis we have investigated kernel methods for unsupervised learning, namely kernel methods that do not require labelled data. Two classical unsupervised learning problems have been tackled with kernel methods: data dimensionality estimation and clustering. The dimensionality of a data set, called its Intrinsic Dimension (ID), is the minimum number of free variables needed to represent the data without information loss. Kernel PCA has been investigated as an ID estimator and compared with a classical dimensionality estimation method, Principal Component Analysis (PCA). The study has been carried out both on a synthetic data set of known dimensionality and on real data benchmarks, i.e. the MIT-CBCL Face database and the Indoor-Outdoor Image database. The investigations have shown that Kernel PCA performs better than PCA as a dimensionality estimator only when the adopted kernel is very close to the function that generated the data set; otherwise Kernel PCA can perform even worse than PCA.

With regard to the clustering problem, we have proposed a novel kernel method, the Kernel Grower [Cam04]. Unlike several classical clustering algorithms, the convergence of Kernel Grower is guaranteed, since it is an Expectation-Maximization algorithm. Its main quality, which distinguishes it from the clustering algorithms published in the literature, is that it produces nonlinear separation surfaces among the data. Kernel Grower compares favorably with popular clustering algorithms, namely K-Means, Neural Gas and Self-Organizing Maps, on a synthetic data set and on two UCI real data benchmarks, i.e. the IRIS data and the Wisconsin breast cancer database. Kernel Grower is the main original result of the thesis.
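As a rough illustration of the ID-estimation comparison described above, the sketch below counts how many principal components, and how many kernel principal components, are needed to retain a fixed fraction of the eigenvalue spectrum on a data set of known intrinsic dimension 2. It assumes scikit-learn; the 0.95 spectrum threshold, the RBF gamma and the synthetic data set are choices made for this example, not taken from the thesis.

```python
import numpy as np
from sklearn.decomposition import PCA, KernelPCA

def id_from_spectrum(eigenvalues, threshold=0.95):
    # Smallest number of leading components whose eigenvalues
    # account for `threshold` of the retained spectrum.
    ratios = np.cumsum(eigenvalues) / np.sum(eigenvalues)
    return int(np.searchsorted(ratios, threshold) + 1)

# Synthetic data of intrinsic dimension 2, embedded nonlinearly in R^3.
rng = np.random.default_rng(0)
t = rng.uniform(-1.0, 1.0, size=(1000, 2))
X = np.column_stack([t[:, 0],
                     t[:, 1],
                     np.sin(np.pi * t[:, 0]) * np.cos(np.pi * t[:, 1])])

pca = PCA().fit(X)
kpca = KernelPCA(kernel="rbf", gamma=1.0, n_components=10).fit(X)

print("PCA ID estimate:       ", id_from_spectrum(pca.explained_variance_))
# `eigenvalues_` requires a recent scikit-learn (formerly `lambdas_`).
print("Kernel PCA ID estimate:", id_from_spectrum(kpca.eigenvalues_))
```

Consistent with the abstract, nothing forces the kernel PCA estimate to match the true ID here: with an arbitrary RBF kernel the feature-space spectrum can spread over more components than the data's intrinsic dimension.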
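The Kernel Grower models each cluster by a one-class description in feature space and alternates between point assignment and model refitting. The loop below is only a loose sketch in that spirit, not the thesis algorithm: it uses scikit-learn's OneClassSVM as a stand-in for the enclosing model, and the cluster count k, gamma, nu and the make_moons data are assumptions of this example.

```python
import numpy as np
from sklearn.svm import OneClassSVM
from sklearn.datasets import make_moons

def kernel_grower_like(X, k=2, gamma=2.0, nu=0.1, n_iter=20, seed=0):
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, k, size=len(X))        # random initial assignment
    for _ in range(n_iter):
        models = []
        for c in range(k):
            Xc = X[labels == c]
            if len(Xc) < 5:                         # guard: re-seed an emptied cluster
                Xc = X[rng.choice(len(X), size=5, replace=False)]
            models.append(OneClassSVM(kernel="rbf", gamma=gamma, nu=nu).fit(Xc))
        # decision_function grows as a point moves toward the "inside" of a model's
        # region in feature space; assign each point to the model that scores it highest.
        scores = np.column_stack([m.decision_function(X) for m in models])
        new_labels = scores.argmax(axis=1)
        if np.array_equal(new_labels, labels):      # assignments settled
            break
        labels = new_labels
    return labels

X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)
print(np.bincount(kernel_grower_like(X, k=2)))
```

Because each cluster model is fitted in feature space with an RBF kernel, the induced separation surfaces in input space are nonlinear, which is the property the abstract emphasizes.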
Similar resources
Immediate Reward Reinforcement Learning for Projective Kernel Methods
We extend a reinforcement learning algorithm which has previously been shown to cluster data. We have previously applied the method to unsupervised projection methods, principal component analysis, exploratory projection pursuit and canonical correlation analysis. We now show how the same methods can be used in feature spaces to perform kernel principal component analysis and kernel canonical c...
Statistical machine learning for data mining and collaborative multimedia retrieval
of thesis entitled: Statistical Machine Learning for Data Mining and Collaborative Multimedia Retrieval Submitted by HOI, Chu Hong (Steven) for the degree of Doctor of Philosophy at The Chinese University of Hong Kong in September 2006 Statistical machine learning techniques have been widely applied in data mining and multimedia information retrieval. While traditional methods, such as supervis...
Kernel Methods and String Kernels for Authorship Analysis
This paper presents our approach to the PAN 2012 Traditional Authorship Attribution tasks and the Sexual Predator Identification task. We approached these tasks with machine learning methods that work at the character level. More precisely, we treated texts as just sequences of symbols (strings) and used string kernels in conjunction with different kernel-based learning methods: supervised and ...
Unsupervised Multiple Kernel Learning
Traditional multiple kernel learning (MKL) algorithms are essentially supervised learning in the sense that the kernel learning task requires the class labels of training data. However, class labels may not always be available prior to the kernel learning task in some real world scenarios, e.g., an early preprocessing step of a classification task or an unsupervised learning task such as dimens...
Unsupervised Discretization Using Kernel Density Estimation
Discretization, defined as a set of cuts over domains of attributes, represents an important preprocessing task for numeric data analysis. Some Machine Learning algorithms require a discrete feature space but in real-world applications continuous attributes must be handled. To deal with this problem many supervised discretization methods have been proposed but little has been done to synthesize...
Publication date: 2004